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FAST MULTI-DIMENSIONAL SCATTERED DATA 
APPROXIMATION WITH NEUMANN BOUNDARY CONDITIONS 

DENIS GRISHIN AND THOMAS STROHMERt 



Abstract. An important problem in applications is the approximation of a function / from 
a finite set of randomly scattered data f(xj). A common and powerful approach is to construct 
a trigonometric least squares approximation based on the set of exponentials ^ e 2 ^ lkx y^ This leads 
to fast numerical algorithms, but suffers from disturbing boundary effects due to the underlying 
periodicity assumption on the data, an assumption that is rarely satisfied in practice. To overcome 
this drawback we impose Neumann boundary conditions on the data. This implies the use of co- 
sine polynomials cos(irkx) as basis functions. We show that scattered data approximation using 
cosine polynomials leads to a least squares problem involving certain Toeplitz+Hankel matrices. We 
derive estimates on the condition number of these matrices. Unlike other Toeplitz+Hankel matri- 
ces, the Toeplitz+Hankel matrices arising in our context cannot be diagonalized by the discrete 
cosine transform, but they still allow a fast matrix-vector multiplication via DCT which gives rise 
to fast conjugate gradient type algorithms. We show how the results can be generalized to higher 
dimensions. Finally we demonstrate the performance of the proposed method by applying it to a 
two-dimensional geophysical scattered data problem. 

Key words. Trigonometric approximation, nonuniform sampling, discrete cosine transform, 
Toeplitz+Hankel matrix, block Toeplitz+Hankel matrix, conjugate gradient method. 
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1. Introduction. An ubiquitous problem in mathematics and in applications is the recon- 
struction or approximation of a function / from its non-uniformly spaced sampling values Sj = f(xj). 
Without further knowledge about / this is an ill-posed problem, since the subspace of functions h 
with h(xj) = sj has always infinite dimension. Moreover in practice we are given only a finite 
number of samples {sj}j_ ly which makes a complete reconstruction of / in general impossible, so 
the best we can hope for is to compute a good approximation to /. Fortunately in many practical 
situations the functions under consideration are not arbitrary, but possess some smoothness proper- 
ties. For instance physics often implies that / is bandlimited. In this and many other cases a linear 
combination of trigonometric basis functions {e 2,rlkx }k£z often provides a good approximation to /. 
Other powerful models for scattered data approximation are based on radial basis functions and on 
shift-invraint systems 1191 . 

Least squares approximation using exponentials as basis functions provides a tool that is general 
enough to be useful in a variety of situations where smooth functions are involved, while the algebraic 
structure of the functions e 27!lkx is rich enough to give rise to fast and robust numerical algorithms 
to compute the approximation, cf. e.g. 1171 HI 151 . 

Arguably the main drawback of approximation by exponentials is the underlying periodicity 
assumption about the function to be approximated. To be more precise, let / be a smooth continuous 
function and let {f{xj)}^_ 1 be samples of / taken at the points x\ < ••• < x r . Without loss 
of generality we assume that X\ = and x r = 1. We want to approximate / on the sampling 
interval [x\,x r ) = [0,1) by a trigonometric polynomial p(x) = JZfcL— m Ck e ' 27 "' kx with M < r/2. If 
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/(O) = /(l) we can safely conclude from Weierstrass' theorem that a trigonometric polynomial of 
low degree will give a good approximation to / on the interval [0, 1). However if /(0) ^ /(l) then 
this difference is felt as discontinuity by the approximating polynomial p. In fact standard Fourier 
analysis tells us that the coefficients {c^jfegz of p will at best decay like o(I/fc), thus a large degree 
M is required to obtain a reasonable approximation to / on [0, 1). However since in practice only a 
finite number of samples is available we may not be able to choose M sufficiently large to obtain a 
satisfactory approximation to /. 

A standard method to enforce periodicity of / on [0, 1) is to multiply / with a smooth "window 
function" w which decays rapidly to zero at the boundaries of the sampling interval. However such 
a procedure can considerably reduce the interval in which the approximation is in agreement with 
the "non- windowed" sampling values f(xj). We could also try to reduce the unpleasant behavior 
caused by the boundary effects by choosing the period N of p slightly larger than the length of the 
sampling interval. Nevertheless, if |/(0) — /(1)| is large we still need a polynomial of large degree to 
obtain a reasonable approximation to / on [0, 1). We also note that boundary effects become worse 
with increasing dimension. 

Instead of extending / (respectively its samples f(xj)) periodically across the boundaries of the 
sampling interval, we can apply Neumann boundary conditions, i.e., a symmetric extension across 
the end points of the sampling interval. This has the big advantage that we avoid the discontinuity 
at the boundaries. The Fourier coefficients of a continuous (periodic) function decay at least like 
ol/fc and at best like ol/fc 2 . Thus loosely spoken, the decay is one order of magnitude faster than 
compared to a periodic extension. This faster decay implies that a lower polynomial degree should 
suffice to obtain a good trigonometric approximation. 1 

If we extend the sampling values fipcj) 1 ^^ symmetrically across the boundaries we obtain a 
sampling sequence that is periodic on the interval [0, 2) and symmetric with respect to the midpoint 
1. To adapt the trigonometric basis functions to this situation we have to replace the exponentials 
| e 27r2fex^^^ ^ ne bggjg functions {cos(7rfcx)} fe6N . The functions cos(itkx) are symmetric around 1 
and periodic with respect to the interval [0, 2). The advantage when using cosine polynomials instead 
of exponentials is obvious from the discussion above: we reduce disturbing boundary effects, which 
results in a better approximation of the original function. 

In the case of trigonometric approximation based on exponentials it has been shown that the 
least squares approximation can be formulated as hermitian positive definite Toeplitz system p£J. 
Grochenig has derived explicit bounds for the condition number of the Toeplitz matrix that allow to 
estimate the stability and convergence of the involved numerical algorithms QUI- Moreover all steps 
to compute and solve the Toeplitz system can be done quickly by (nonuniform) FFT-based methods. 

The crucial questions that we will investigate in this paper are: Does the least squares approx- 
imation problem using cosine polynomials also give rise to a linear system of equation whose matrix 
has a nice structure? Can we find fast and robust numerical algorithms to solve the least squares 
problem? Can we give a priori estimates on the condition number of the matrix? Can we generalize 
the algorithm easily to higher dimensions? How does our approach perform for real world problems? 
This paper is devoted to clarify these questions. 

The rest of the paper is organized as follows. In Section Q we analyze the least squares ap- 
proximation problem using cosine polynomials. We show that the resulting matrix has a certain 
Tocplitz+Hankel structure and derive estimates on the condition number of this matrix. In Sec- 
tion Q we present a fast algorithm to solve the least squares problem using the conjugate gradient 
method and the discrete cosine transform (DCT). The generalization to the multi-dimensional case is 
described in Section^ Finally in Section|S|we demonstrate the performance of the proposed method 
by applying it to a scattered data problem arising in geophysics. 

The idea of using Neumann boundary conditions instead of periodic boundary conditions has 
turned out to be very fruitful in the context of image deblurring problems. In fact, the research 

x This is exactly the reason why the (old) JPEG image compression algorithm uses the DCT 
instead of the DFT. 
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presented in this paper was inspired by the article A fast algorithm for deblurring models with 
Neumann boundary conditions by Michael Ng, Raymond Chan, and W.C. Tang 1131 . 

2. Nonuniform sampling, cosine polynomials, and Toeplitz+Hankel ma- 
trices. We start by defining the space Pm of cosine polynomials of maximal degree M as 

V : P(x) = + J2 c k cos(7Tkx), c = { Cfc }££ 6 M M+1 j . (2.1) 

There are two reasons for the introduction of the 1/ \/2-scaling factor of the coefficient Co in 12.11 . 
The first reason is that we have the Parseval type identity 

IHH= / b(aOI 2 ^ = f + ~£4 = ~Nll. (2.2) 

-oo fc=1 

The second reason is increased stability of the numerical algorithms we are going to derive, as we 
will explain in the remark after Theorem 12. II 

Let us return to the approximation problem. Given sampling points 2 {xj} r _ 1 and sampling 
values we want to solve the least squares problem 



i^b(ay)-*i[ : V (2-3) 



mm 

p&Pm . 

3=1 

Here the wj > are weights which the user may choose at her convenience. Often the trivial choice 
Wj = 1 is sufficient. In other cases it is useful to choose the weights such that they compensate for 
irregularities in the sampling set, i.e., smaller weights are used in regions with high sampling density 
and larger weights in regions with few sampling points. In 12.51 we have assumed that the polynomial 
degree M is fixed. We will discuss the important question of how to determine the appropriate degree 
of the approximating polynomial in Section 151 

By defining the r X (M + 1) Vandermonde-likc matrix V via 

{-h=^/un, for k = 0; j = 1, . . . ,r. 
y/Wj cos(irkxj), for k = 1, M; j = 1, . . . , r, 

and setting aC*") = {^/w 3 'S 3 -}J =1 we can reformulate the least squares problem 12.31 as 

min \\Vc- sMlli (2.5) 

It is well-known that the solution of 12.51 can be computed by solving the normal equations 

V T Vc = V T s^l (2.6) 

Switching to the normal equations can lead to problems of numerical instability due to the squaring 
of the condition number of V. However, as we will see, the system matrix of the normal equations 
has a very nice algebraic structure that paves the way to fast numerical algorithms for solving 12.31 . 
Thus to handle the trade-off between numerical stability and computational efficiency it is important 
to have an a priori estimate of the condition number of the matrix V. Such an estimate will aid 
us in the decision if we shall compute the least squares solution by a direct solution of the system 
Vc = s( w > or by switching to the system V T Vc = V T s^ w ' . 

The following theorem provides both insight in the algebraic structure of V T V and an upper 
bound of the condition number of V T V. 

Theorem 2.1. Assume we are given nonuniformly spaced sampling points {xj} r =1 £ [0,1], 
sampling values s = {sj}J =1 and positive weights {wj}^_ 1 . Define A := V T V, where V is as 
in |2~H . and set b = V T '«W, There holds: 

(i) The matrix A is a scaled Toeplitz+Hankel matrix of the form 

A = D(T + H)D, (2.7) 



2 Throughout the paper we will always assume that the sampling locations Xj are pairwise distinct. 
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where 



ao 
«i 



01 
ao 



O-M-l a M 
«M-1 



O-M-l 



ai 



«i 

ao 



H 



ao 
ai 



ai 

"2 



with 



1 r 

= — io j cos(7rfcXj), 



o.m-1 
a-u a,M+i 



,2M + 1, 



Q2M 
02M «2M+1 



(2.8) 
(2.9) 



and D = diag(-^=, 1, . . . , 1). 

(ii) If M < r then A is invertible and the coefficient vector c = {cfc}jc_ of the cosine polynomial 
p £ Pm that solves 12.31 is given by 



c = A- 1 b. 



(Hi) Define the weights Wj by 



^ j — 1 



where we set xq 



-XI, X r +1 '■= 2 — XV . // 



5 := max|x,_i_i — xA < — 
j 1 3+ 31 M 



then the condition number k(A) is bounded by 

k(A) < 



(1 + 8M)' 2 



(1 - 8M) 2 ' 
Proof, (i) Note that 

r 

Akj = {V T V) ki i = s k j ^2 w j cos(ttIxj) coa(Trkxj), 

3=1 

where 



k,l = 0, 



, M, 



x/2 
1 



if fc = and I = 0, 

if fc = or I = 0, fc ^ i, 

if fc > and 2 > 0. 



The result follows now readily from a simple calculation by applying the formula 

cos(a) cos(/3) = cos(q + 0) + cos(o — /3), 



(2.10) 



(2.11) 



(2.12) 



(2.13) 



(2-14) 



(2.15) 



(2.16) 



to 12.141 and using the fact that the entries of T and H satisfy ; = a^_j and ; = a/e+i 
respectively. 

(ii) The invertibility of A follows from the well-known fact that the Vandermonde-like matrix 
V has rank M + 1 for mutually different points Xj (assuming Wj ^ 0) . The rest follows from 12.61 . 

(iii) With the exception of a few minor modifications the proof of this part is similar to 
Grochenig's elegant proof on the upper bound of the condition number of certain Toeplitz ma- 
trices, see However instead of confronting the reader with a patchwork of required modifications 
of Grochenig's proof we prefer to present a complete proof. 
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The proof makes use of Wirtinger's inequality If / 6 L 2 (a,b) and either f(a) = or 
f(b) = 0, then 

b b 

J \f(x)\ 2 dx<-^(b-a) 2 J \f(x)\ 2 dx. (2.17) 

a a 

We proceed with the proof of (iii). Let P be the orthogonal projection of L 2 ([0, 1]) onto Pm- 
Define the operator S by 

r 

Sp = P^p(x j ) Xj ). (2.18) 

j=i 

Here Xj( x ) denotes the characteristic function of the interval [yj—i,yj], where j/j = ,+ \ — ~>j = 
1, . . . , r with xo = — xi, x r +i = 1 — x r . 
We compute 

lb-5p||| = ||P(^(p-p(^)) Xj )||l < ||$>-p(^))x,lll = 

3=1 3=1 

1 Bj 

r r X 

\Y J {P-P^ ] ))x ] \ 2 dx = Y J J Ip-Pixj^dx. (2.19) 
We write 

Vj Xj Vj 

J \P-P(xj)\ 2 dx = J \p- p(xj)\ 2 dx + J \p- p(x 3 )\ 2 dx, 

Vj~l Vj-l Xj 

and apply Wirtinger's inequality 12.171 to each of the integrals on the left-hand side. Since \yj—Xj | < 
8/2 and \xj — Vj—i\ < (5/2 we obtain 

Vj 1 
J2 [ \P~P(xi)\dx<^it I '\p'(x)\dx=^\\p'\\l (2.20) 



Note that 



M M 
p'(x) = c^irk sin(7rfcx), = c^irk sin(7rfcx). (2-21) 
fc=0 fc=i 



Hence we have the Bernstein type inequality 

M 



o'Wl 



I c fc 7rfc sin(7rfea;)| 2 (ix 



fc=l 



< (ttM) 2 / | <>k sin(nkx)\ 2 dx < (7rM) 2 ||p|| 2 . (2.22) 


Thus by combining |2~T51 , IT^HI and |2~2^1 we get 

l|p-5p|||<«5 2 A/ 2 ||p|||. (2.23) 

Hence 

\\I - S||o P < 8M, (2.24) 
and since S < 1/M by assumption, we conclude that 5 is invertiblc and 

||S -1 ||op < (1-SM)- 1 . (2.25) 



G 



There holds 



(1 - SM) 2 \\p\\l = (1 - SMfWS^Spf^ < 

r 

< (1 - SMfWS-^WSpWl < \\Spf 2 < bfe)l 2 ™i 



(2.26) 



i=i 



Also 



\P( x i)\ 2w 3 ^ \\P-P + ^P( x i)Xj\\l < (llPll + IIP-^PfoOXill: 

2=1 2=1 j=l 

<(||p||2+5Af||p|! 2 ) 2 <(l + 5M) 2 ||p||l. 



Thus 



(l-<5M) 2 ||p|| 2 < £ Ipfe)! 2 ^- < (l + SM) 2 \\p\\l 

2=1 

By definition we have for any p £ fjv/ with coefficient vector a 

r 

(Aa, a> = (V T Va, a) = (Va, Va) = ^ \p(xj)\ 2 Wj. 

2=1 

Using the relation = 5IMI 2 , we obtain 



1(1 -,5M) 2 ||a|j 2 < (Aa,a>< 1(1 + «5M) 2 ||a|| 2 , 



(2.27) 
(2.28) 

(2.29) 
(2.30) 



and therefore 



k(A) < 



(1 + <5M) 2 
(1 - <5M) 2 



Remark: We briefly analyze the least squares problem 12.51 when using non-scaled cosine poly- 
nomials p(x) = 2Z*Lo c k cos{-wkx). It is easy to see that the corresponding Vandermonde-like matrix 
V satisfies 



VD = V 



with D as in part (i) of Theorem 12. H and V as in <2TH . Hence 



A := V V = D~ L V L VD~ 



The estimates 



and 



imply that 



\\Ax\\ 2 < \\D~ 



Hop n^iiop if ii2 



x||2 < 2||A|| op ||z||2, 



jA-^Ha < ||D||2 p ||A|| p||a:||2 < ||A|| op ||x|| 2 , 



cond(A) < 2cond(A). 



(2.31) 

(2.32) 
(2.33) 

(2.34) 

(2.35) 



Thus the condition number of A can be twice as large as the condition number of A. This is why we 
prefer to use scaled cosine polynomials as defined in 12.11 . The inequality 12.351 is sharp as can be 
seen from the following simple example. Let the sampling points xj be equally spaced, and choose 
the weights Wj as in Theorem 12. II In this case it is not difficult to see that 



A=-I M +i, 

where Im+1 denotes the (M + 1) X (M + 1) identity matrix, whereas 

\2 ... 01 
,01 ... 



,0 



Thus obviously cond(A) = 2cond(A) in this case. 



(2.36) 



(2.37) 
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3. Fast computation of the least squares approximation. In this section wc 

present a fast algorithm for solving the least squares problem 12,31 . Our algorithm is based on the 
conjugate gradient method in connection with a fast matrix-vector multiplication involving the DCT. 
Before we proceed we briefly review some properties of the DCT-I. There are four types of the DCT, 
cf. 1221 . For our purposes we will use the (scaled) DCT-I. 

Definition 3.1. The Type-I Discrete Cosine Transform matrix (DCT-I for short) of size n x n 
is defined by 

{-=!== cos (vr-S-) if k = or k = n - 1, 
?as=s tf* = i,-..,n-2. 

If the dimension of the matrix C n is clear from the context we drop the subscript and simply write 
C instead. 

The DCT-I matrix C satisfies CC = /. It is not unitary, but can be easily made unitary 
by appropriate scaling. For define the diagonal matrix D = diag([l, \/2, . . . , \/2, 1]) and set C = 
D—XCD. Then it is easy to see that CC T = I. In some cases it is more convenient to work with C 
instead of C 1111 . However the results presented in this paper can be more elegantly expressed when 
using the definition 13.11 of the DCT-I. Fast algorithms for computing Cx require 2.5 0(n log n) 
operations if a; is a vector of length n + 1 and n is a power of two 1251 . cf. also 12 1 1 IT) 

It is well-known that the DCT-I matrix diagonalizes certain Toeplitz+Hankel matrices I1HII1 II . 
For let T = toep(a) be a symmetric Toeplitz matrix with first column a = [oq, a\, ■ ■ . , a n ] T . We 
define the counter-identity matrix J by 







(3.2) 



If 

B = tocp(a) + Jtoep(Ja) := T + H (3.3) 

(note that J toep( Ja) is a Hankel matrix that is symmetric with respect to the counter diagonal) 
then 

C T BC = E, where S is a diagonal matrix. (3-4) 

An important consequence of this diagonalization property is that the multiplication of a matrix 
B of the form 13.31 with a vector x can be carried out in O(nlogn) operations via DCT-I [I], similar 
to the multiplication of a vector by a Toeplitz matrix which can be computed via FFT by embedding 
the Toeplitz matrix into a circulant matrix. 

To be precise, assume we want to compute y = Bx where C T BC = S. There holds 

y = Bx = C T C T BCCx = C T T,Cx. (3.5) 

Of course in a numerical implementation we would not compute the diagonal matrix S explicitly. 
Instead we proceed as follows. Let b be the first column of B, define the scaling matrix D± = 
diag(2, 1, . . . , 1, 2) and observe that C = D~ 1 C T D\. A simple calculation shows that D~ 1 S = 

diag(C T 6). Hence 



y = Bx = C T D 1 D~ 1 T,Cx = yj -^—C T diag(DiC T 6)Df 1 C T Dix, 

and therefore 



y = ^^C T [(C T b)o(C T D lX ])], (3.6) 

where the operation "o" denotes the pointwise product between vectors. Hence the product Bx can 
be computed by three DCT-I's in 0(n log n) operations. 
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Observe that the Toeplitz+Hankel part of the matrix A = D(T + H)D in |2~71 of Theorem I5T1 
is not of the form 13,31 . since the first row and the last column of the Hankel matrix H in 12,81 
have different entries. Thus A is not diagonalized by the DCT-I (or any other DCT). But we can 
embed the Toeplitz+Hankel part of A in a Toeplitz+Hankel matrix of the form 13.31 . similar to the 
embedding of a Toeplitz matrix in a circulant matrix. To see this, let T and H be defined as in 12.81 . 



We embed T - 
where 



H in the (2M + 1) X (2M + 1) augmented Toeplitz+Hankel matrix T aug + H & 



Ha 



The matrix T + H is the (M - 



a.M+1 

. a 2M 
a 

a M 
a M+l 

- a 2M 
1) X (M - 



a M a M+l 



a M+l a AI 
a M a M+l 



"2M 



O-M-l 



a 
a 2M 



a M 
«M-1 



(3.7) 



(3.8) 



a M+l a M ■ ■ ■ a 
1) principal leading submatrix of T aug - 



H a 



Thus for a DCT-I based fast implementation of the matrix vector product Ax we proceed 



as follows. We write y = Ax = D(T + H)Dx and define x a 



{(Dx) T ,0,...,0] T . Compute 



J/aug = ^aug^aug according to 13. bl . The vector y is then given by the first M + 1 entries of J/aug 
multiplied by D. 

In order to obtain augmented matrices whose size is 2™ + 1 we can always insert as many zeros 
as necessary after (i2M in the first row of T aug and H aug without destroying the algebraic structure 
of the matrices. Thus the matrix vector multiplication Ax can always be carried out in 0(M log M). 
This zero-padding is similar to the zero-padding of the Toeplitz case (where the zeros are added in 
the middle of the first row). 

Note that a direct computation of the entries of the matrix A and of the right hand side 
b will take O(Mr) operations. Thus, although we can solve the system Ax = b in O(AflogAf) 
operations, the computation of the entries of A and b will soon become the bottleneck for large scale 
problems. Fortunately there exist fast algorithms for computing sums of the form 12.91 . In 1141 
Daniel Potts has developed fast algorithms for computing the DCT for nonuniformly spaced points. 
Like nonuniform FFT algorithms 1151 a nonuniform DCT-I (NDCT for short) can be computed in 
0(ctM log(aM) + mr) operations, where a and m are constants. See 1141 for details. 

Based on the observations above, we propose the following fast algorithm for solving the least 
squares problem 12.31 . 

Algorithm 1 (Fast scattered data approximation using cosine polynomials). 
Input: Nonuniformly spaced sampling points {xj}J =1 6 [0,1], sampling values {sj}j_ lt weights 
{u>j} r =1 and user-defined points {ti}f_ Q S [0, 1]. 

Task: Compute the coefficients of the cosine polynomial of degree M that solves 12.31 and evaluate 
the polynomial at the points {t[}f'_ 1 . 

Step 1: Compute the first column of A in 12.71 and the right hand side b = V T s^ w ' via NDCT. 
This takes 0(ceMlog(aM) + mr) operations, where a and m are (small) constants. 
Step 2: Solve Ac = b iteratively by the conjugate gradient method. Using fast matrix-vector multi- 
plication this can be done in O(MlogAf) operations per iteration. 
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Step 3: Evaluate p(x) = + 2fc=i c k cos{yrkx) at the points {t;}^_ . If t t = l/L and L = 2™ for 
some n S N, then this can be done by a DCT in O(LlogL) operations. If L ^ 2™ j«e can use a /asi 
radix-p DCT, see lait . If the ti are nonuniforraly spaced we use a NDCT to compute {p(t;)}^_ 1 . 
Output: Least squares approximating polynomial p of degree M , evaluated at the points {ti}/Li. 

Remark: If the sampling set satisfies the maximal gap condition 12,121 and the weights are 
chosen according to 12,111 we can utilize the bound on re (A) in |2"T51 of Theorem |0] to estimate 
the rate of CG using the standard formula :5> 



where c'"' denotes the solution after the n-th iteration of CG applied to Ac = b. 

If the condition number of A is large (whether or not the maximal gap condition is satisfied) 
it may be better to solve the least squares problem 12.31 Vc = b without explicitly establishing the 
normal equations. One can resort to "non-symmetric" versions of CG such as GMRES or LSQR, 
cf. 0. Since the NDCT provides a fast way to carry out the multiplication of the matrix V with 
a vector we still obtain a fast algorithm. However the computational costs are in general larger 
than those for Algorithm since a NDCT is more expensive than a DCT and the NDCT has to be 
applied in each iteration, whereas in Algorithm it has to be applied only in the initial stage of the 
algorithm. 

If the matrix A is ill-conditioned due to large gaps in the sampling set one might be tempted to 
apply one of the cosine-transform based preconditioners to improve the situation. However precon- 
ditioners cannot significantly improve the stability in this case. This can be shown in a similar way 
as it is done in Section 4.2 of 24 for trigonometric approximation using exponentials. 

There exist fast direct methods to solve Toeplitz+Hankel systems (not all of them apply to 
our situation though), see [121 and in particular the work of Heinig [111 1101 . But many of these 
solvers require that the matrix dimension is a power of two. It is possible to overcome this severe 
constraint, however at the cost of a more involved algorithm. As we have seen for the conjugate 
gradient iterations the initial size of the matrix does not play a major role, since when constructing 
the augmented matrix we can always insert the appropriate number of zeros to get a size of a power 
of two. Furthermore, if the set of sampling points is a jittered version of a set of regularly spaced 
points, standard perturbation theory implies that the eigenvalues of A will be clustered around 1. 
Thus CG will converge in very few iterations. Direct solvers cannot take advantakc of such sitations. 

3.1. Multilevel scattered data approximation. The reader may have noticed 
that we have tacitly assumed that the polynomial degree M is given a priori. Although this is a 
common assumption in polynomial approximation it is not justified in many applications. In fact, 
the appropriate choice of M has a major influence on the usefulness of the resulting approximating 
polynomial, cf. [23]. In [3J Ot mar Scherzer and the second author have developed a multilevel scheme 
that automatically adapts to the solution of the optimal "level", i.e., the optimal polynomial degree 
in our case. This multilevel algorithm applies to our approximation method without modification. 

In a nutshell the multilevel version of Algorithm^ works as follows, for details we refer to [20 8 . 
We start at the first level with an initial choice for the approximating polynomial (e.g., Mo = 1) 
and apply Algorithm^ We stop the CG iterations when a specific stopping criterion is satisfied and 
obtain the approximation pi , say. Then we proceed to the next level by choosing a degree Mi > Mo 
(e.g., Mi = Mo + 1). We use the approximation pi from the previous level as initial guess for the 
solution at the new level and apply Algorithm We proceed through increasing levels until at the 
fc-th level the approximating polynomial p^ satisfies the discrepancy principle 



where e is a parameter related to the accuracy of the given data Sj. 

A fast O(MlogAf) implementation of the multi-level scheme for cosine polynomials can be 
derived in a similar way as it is done for the exponentials, see Algorithm 2 in Section 5.1 of An 




r 



r 




(3.10) 
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crucial observation thereby is that the scaled Toeplitz+Hankel matrix Am associated with the least 
squares problem 12,51 for degree M is related to the matrix Am+i associated with the least squares 
problem 12,51 in a nice way. Namely, Am is the principal leading submatrix of Am+i- 



Remark: Finding the optimal level for the approximating function is a common and important 
problem in scattered data approximation. When using radial basis functions or shift-invariant sys- 
tems as model one has to deal with the trade-off between accuracy and stability when determining 
the width of the basis functions, cf. e.g. 1191 . The multi- level idea provides a natural framework to 
handle this trade-off. 

4. Two-dimensional scattered data approximation. Many of the results of the 

previous sections can be extended to arbitrary dimensions. For the sake of simplicity of notation we 
will focus mainly on the two-dimensional case. 

We are given sampling values s = {■Sj}^_ 1 and randomly spaced sampling points {(xj, yj)} T j_^- 
Without loss of generality we assume that (xj,yj) £ [0, 1] X [0, 1], otherwise we can always renormalizc 
the sampling points accordingly. 

The space Pm x m v consists of two-dimensional cosine polynomials p of degree M x M y defined 

by 

p(x,y) = —j= + ^2^2 Ck,lcos(irkx)cos(irly), (4.1) 
V 2 k=0 1=0 

max{ k ,1} >0 

with real- valued coefficients c/t 

Analogous to the one-dimensional scattered data problem we want to find the p £ Pm x m y that 
solves 

r 

min^ \p(xj, yj) - Sj\ 2 Wj. (4.2) 
3=1 

We define the block matrix V by 

V=[V(°) VW ... , (4.3) 

with Vy2 = £k iy/Wj cos(irkxj) cos(irlyj), j = 1, . . . , r, (4.4) 

f if k = and I = 0, 

where e fe j = < v2 (4.5) 
1 1 if k = 0, . . . , M x ;l = 0, . . . , M s ;max{fc, /} > 0. 

By stacking the columns of c and with a slight abuse of notation we can rewrite 14.21 as 

min||Vc- s^\\, (4.6) 

where s^ w ' = {^/uijSjj^j^. 

Similar to the 1-D case, we can solve 14.61 by switching to the normal equations. The next 
theorem describes the algebraic structure of the system matrix of the normal equations. 

Theorem 4.1. Let V be as defined in I4.3I - |4~H . Then the matrix A := V T V is a scaled block 
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Toeplitz+Hankel matrix of the form A = D(T + H)D with 

... ^(A-fy-l) 



H : 



A {My) 

A(°) A« 
A(i) A« 



A (M y ) 



a( m ») 

^(M„-l) 



A(°) 
A( M v) 

A( m »+!) 



(4.7) 



A (2M y -l) 
A (2M y -l) A( 3M v) 



(4.1 



where each block A^ k \ k = 0, . . . , 2A/ y is an X (M^+l) matrix of the form A^ = +H<- k '> 

with T(k) and ffW as in i2~8l and D = diag(^, 1, . . . , 1). 
Proof. It follows from 14,31 and 14.41 that 

r 

A l,l',k,k' = £k,l£k',V ^2 w i{ cos ( 7r ( l + l')xj)cos(n(k + k')yj) + cos(tt(/ - l')xj) cos(n(k + k')yj)+ 



j'=l 

+ cos(7r(i + l')xj) cos(-7r(fc — k')yj) + cos(7r(( — l')xj) cos(7r(fc — k')yj)j ■ (4-9) 

Here the indices Z, /' refer to the (I, Z')-th block of A and the indices k, k' refer to the element in the 
fc-th row and fc'-th column in a certain block. 

Now we consider the entries of A for fixed I and V . Using formula 12.161 we calculate 

r 

A l,l',k,k' = s k ,l£k',l' ™j(ci[cos(7r(fc + &')%) + cos(7r(fc - fe')%)] + 

3=1 

+C2 [cos(-7r(fc + k')yj) + cos(7r(fc — k')yj)] \ , k, k' = 0, . . . , Ms, 

(4.10) 

where the constants c\ and C2 are given by c\ := cos(-7r(i + l')xj), C2 := cos(-7r(Z — l')xj). Thus the 
(I, l')-th block of A is indeed of the form 12771 . 

By repeating this step with reversed roles for k, k' and I, I' we see that the "global" structure 
of A is of the form 14.81 . □ 

In order to utilize the block Toeplitz+Hankel structure of the normal equations we have to 
extend the fact that the DCT-I diagonalizes certain Toeplitz+Hankel matrices to the case of block 
Toeplitz+Hankel matrices. 

We need some preparation before we proceed. Let B be a block matrix of the form 

£(0,0) _ #(0,n-l) 



B (n-1,0) 



(4.11) 



where the blocks are matrices of size m X m. For such block matrices we define the mod-m 

permutation matrix n mi „ via 



[n m , n Bn^ in ] lJ;M = B kMi<j , < i,j < m - 1, < k,l < n- 1. 



(4.12) 



In words, the (i, j')-th entry of the (fe, i)-th block of B is permuted to the (k, l)-th entry of the (i, j)-th 



block. We have U r 



J-m,n 11 n } m ' 

Definition 4.2. The two-dimensional type-I Discrete Cosine Transform of an m x n 
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is given by 



771 — 1 71 — 1 



V2m - 2V2n - 2 



i = 0, . . . , m — 1; j = 0, . . . , n — 1, 



(4.13) 
(4.14) 



where 



1 i/i6{0,m-l} and j e {0, n - 1}, 

2 i/i e {0,m - 1} and j £ {0,n - 1}, 
2 i/i g {0,m - 1} and j G {0, n - 1}, 

4 i/i = l, , m — 2 and j = 1, . . . ,n — 2. 



The two-dimensional DCT-I can be represented by the ran X mn matrix C m ®C n where the matrices 
C m and C n represent one-dimensional DCT-I's as in definition ^. H and ® denotes the usual Kronecker 
product. 

Similar to the 1-D DCT-I the 2-D DCT-I diagonalizes certain block Toeplitz+Hankel matrices. 

Theorem 4.3. A matrix B is diagonalized by a two-dimensional DCT-I if and only if B is of 
the form 



BP) S(°) 



B(«-i) 



B (n-2) B (n-1) 

B(""2) 



B(0) 



' B(°) B« 

bM b( 2 ) 

Bt™" 1 ' 



B (n-2) B {n-1) 

B<™- 2 ) 

b(°) 



(4.15) 

where each block B' fc ) , fc = 0, . . . , n — 1 is amxm Toeplitz+Hankel matrix of the form 13.31 . 

Proof. The proof is similar to the proof of Theorem 3.3 in |13| and uses basic properties of the 
Kronecker product ®. Let B be a block Toeplitz+Hankel matrix as in the assumption of the theorem. 
We have to show that B is diagonalized by the two-dimensional DCT-I C = C n ®C m . Note that 
each block B< fc ) of B can be diagonalized by a one- dimensional DCT-I C m , i.e., CjBWC m = A< fc ), 
k = 0, . . . n — 1, where the A( fc ) are m X m diagonal matrices. Since C„®C m = {Cn®I m )(I n ®C m ) 
it follows that 

(C*„(g.C m ) T B(C„®C m ) = (C^®/ m )(/n®C£)B(/„®C m )(C„®/m) = (C^®/ m )A(C„®7 m ), (4.16) 



where 



A : 



AC) A« 
AM AC) 



At™- 1 ) 
We compute 



A (n-2) A(ra-l) 
A (n-2) 



AC) 



■ AO AW 
AM AC) 



A (n-2) A 0-l) 
At"" 2 ) 

AC) 



n m „An — b 



bC) o 

B« 






o bC™- 1 ) 



(4.17) 



(4.18) 
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where is an n X n zero matrix. It follows from 14,171 that each _B( fc ) , k = 0, . . . , m — 1 is an n X n 
Toeplitz+Hankel matrix of the form i3~51 . Therefore C^B^d = A< fc ', fe = 0, . . . ,m - 1. 
Since U m>n (C^ ^I m )Tlf nn = J m <g>C£ (e.g., see we have 

(cT®i m )A(c„®i m ) =n m ,„n^ i „(c^®j m )n^ i „n m ,„An^ in n m ,n(c„®/ m )n m ,„n^ >n 
=n^ >n (Jm®c^)s(/ m ®c^)n m , n 

=n^ in An m ,„, (4.19) 

where A is a block diagonal matrix with diagonal blocks A'*). Thus A is a diagonal matrix. It follows 
from the definition of Tim,n that If^ n AlI rajn is then also a diagonal matrix. 

The opposite direction follows from the fact that CC = I. □ 

The matrix A associated with the least squares problem 14.61 is not diagonalized by the 2-D 
DCT-I. But analogous to the 1-D case, A can be embedded into a block Toeplitz+Hankel matrix 
that is diagonalized by the 2-D DCT-I. Thus similar to the 1-D case the matrix-vector multiplication 
Ax can be carried out in 0(M x M y log M x M y ) operations. 

We leave it to the reader to extend Theorems 14. II and 14.31 and the fast matrix- vector multipli- 
cation to dimensions larger than two. Since the NDCT can also be generalized to two and higher 
dimensions we have a fast numerical algorithm for computing the least squares approximation using 
cosine polynomials in multiple dimensions in the same way as it is outlined in Algorithm HI 

Remark: There is one notable difficulty that arises when considering the scattered data ap- 
proximation problem in higher dimensions. In the 1-D case a sufficient condition for invertibility 
of the matrix A is that the polynomial degree M is smaller than the number of samples r. This is 
an immediate consequence of the fundamental theorem of algebra. Unfortunately the fundamental 
theorem of algebra does not extend to the multi-dimensional case. It is obvious that a necessary 
condition for the existence of A" 1 is M < r. However this condition is no longer sufficient, since 
the sampling points need not be appropriately distributed. In higher dimensions, the zero set of a 
polynomial is an algebraic curve or an algebraic surface. For A to be invertible, the samples must not 
be contained in any algebraic surface. It is an open problem to efficiently characterize all sampling 
sets that yield an invertible matrix A. 

It is still possible to obtain conditions that guarantee the existence of A -1 as well as to derive 
estimates for the condition number of A in the multi-dimensional case. This can be done for instance 
by adapting the approach in Section 4.3 of |S| to our situation. However the estimates are no longer 
sharp and get worse with increasing dimension. We do not pursue this direction here. 

5. Numerical experiments: An example from geophysics. We demonstrate 

the performance of the proposed algorithm by applying it to a scattered data problem from geo- 
physics. Exploration geophysics relies on measurements of the Earth's physical properties like the 
magnetic or gravitational field, with the goal of detecting anomalies which reveal underlying geo- 
logical features. In geophysical practice, it is essentially impossible to gather data in a form that 
allows direct interpretation. Geoscientists, used to look at their measurements on maps or profiles 
and aim at further processing, need a representation of the originally irregularly spaced (scattered) 
data points on a regular grid. The reconstruction or approximation of potential fields on regular 
grids from scattered data is thus one of the first and crucial steps in the analysis of geophysical data. 

As test example we use a synthetic anomaly / that represents the gravitational acceleration 
caused by an ensemble of buried rectangular boxes of different size, depth, and density contrast, 
see Fig. |3a). This example has also been used in 1161 . We sample this function at 496 randomly 
spaced points (xj,yj) in the interval [0,1] X [0,1]. Since in practice measurements are always con- 
taminated by noise we add white Gaussian noise in the amount of 5% of the £ 2 -norm of the samples 
f(xj,yj). We want to reconstruct the function on a regular grid T consisting of the grid points 
{(fc/150,«/150)}i50 =o . 

In order to demonstrate the advantage of using Neumann boundary conditions over periodic 
boundary conditions we compare the proposed algorithm to the so-called ACT method l-H 151. The 
latter has become a main ingredient for several approximation methods in geophysics 16 2]- We 
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also include in the comparison the approximation obtained by cubic spline interpolation, which we 
computed via the MATLAB function griddata using the option 'cubic'. 

For the two methods using trigonometric approximation we use the same number of coefficients 
for the approximating polynomial. We use a total of 11 coefficients in the x-coordinate and the same 
number in the y-coordinate, resulting in approximating polynomials of degree 121 for both methods. 

Since we know the original anomaly / we can compute the error between the approximation 
fa and / via e(f a ) = ||/(r) — /a(r)||2/||/(r)|| on the grid T. The proposed method gives an error 
of 0.029, the ACT method yields approximation error 0.072, and the approximation computed via 
cubic splines returns an error of 0.045. The approximation computed by the proposed method is 
appealing both from a visual and from an approximation error viewpoint. 

The significantly larger error of ACT is only due to boundary effects. We note that there 
are several ways to improve the performance of the ACT method, see 1161 . which makes it indeed a 
powerful approximation method in geophysics Ilbll2*l , Since all these modifications can also be applied 
to the proposed method we expect that the proposed (modified) algorithm will still be significantly 
better than the modified ACT method. 

The results of this experiment do not mean that the proposed method always performs better 
than the other two methods. Furthermore, a detailed comparison of various scattered data approx- 
imation methods would have to include other standard methods such as approximation by radial 
basis functions. Such a comparison is beyond the scope of this paper. 
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